[SPARK-24891][SQL] Fix HandleNullInputsForUDF rule #21851

Closed
wants to merge 2 commits into from

Conversation

maryannxue
Contributor

@maryannxue maryannxue commented Jul 23, 2018

What changes were proposed in this pull request?

The HandleNullInputsForUDF rule adds a new If node every time it is applied, so a plan that is analyzed twice (or more) differs from the same plan analyzed once. This causes problems such as the cache manager failing to match the plan. The fix is to mark the checked arguments as null-checked by wrapping them in a "KnownNotNull" node when the UDF is placed under an If node, since the UDF will not be called when any of those arguments is null.
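The idempotence problem described above can be illustrated with a minimal, self-contained sketch. It uses a toy expression tree rather than Spark's Catalyst classes: every name here (`Expr`, `Attr`, `NullLiteral`, `Udf`, `NullCheckRule`, the `primitiveParams` flags, the `markChecked` switch) is hypothetical and chosen only for illustration, and the `needsNullCheck` condition is an assumption, not the rule's actual implementation.

```scala
// Toy expression tree. These are NOT Spark's Catalyst classes, only stand-ins.
sealed trait Expr { def nullable: Boolean }
case class Attr(name: String, nullable: Boolean) extends Expr
case object NullLiteral extends Expr { val nullable = true }
case class IsNull(child: Expr) extends Expr { val nullable = false }
case class Or(left: Expr, right: Expr) extends Expr { val nullable = false }
// Marker node: records that a null check already guards this input.
case class KnownNotNull(child: Expr) extends Expr { val nullable = false }
case class If(cond: Expr, whenNull: Expr, otherwise: Expr) extends Expr {
  val nullable = whenNull.nullable || otherwise.nullable
}
// primitiveParams(i) is true when the i-th UDF parameter cannot hold null.
case class Udf(name: String, primitiveParams: Seq[Boolean], inputs: Seq[Expr]) extends Expr {
  val nullable = true
}

object NullCheckRule {
  // Assumption for this sketch: an input needs a check when the parameter is
  // primitive and the input may still be null.
  private def needsNullCheck(primitive: Boolean, input: Expr): Boolean =
    primitive && input.nullable

  // markChecked = false mimics the old behaviour (inputs left unmarked);
  // markChecked = true wraps checked inputs in KnownNotNull.
  def rewrite(e: Expr, markChecked: Boolean): Expr = e match {
    case udf @ Udf(_, params, inputs) =>
      val pairs = params.zip(inputs)
      val checks: Seq[Expr] = pairs.collect {
        case (p, in) if needsNullCheck(p, in) => IsNull(in)
      }
      if (checks.isEmpty) udf
      else {
        val newInputs = pairs.map { case (p, in) =>
          if (markChecked && needsNullCheck(p, in)) KnownNotNull(in) else in
        }
        If(checks.reduce[Expr]((a, b) => Or(a, b)), NullLiteral, udf.copy(inputs = newInputs))
      }
    case If(c, t, f) =>
      If(rewrite(c, markChecked), rewrite(t, markChecked), rewrite(f, markChecked))
    case other => other
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val udf = Udf("plusOne", Seq(true), Seq(Attr("a", nullable = true)))

    val once = NullCheckRule.rewrite(udf, markChecked = true)
    val twice = NullCheckRule.rewrite(once, markChecked = true)
    println(once == twice) // true: the second pass finds nothing left to check

    val naiveOnce = NullCheckRule.rewrite(udf, markChecked = false)
    val naiveTwice = NullCheckRule.rewrite(naiveOnce, markChecked = false)
    println(naiveOnce == naiveTwice) // false: a second If was nested inside the first
  }
}
```

In this sketch the marker node changes nothing about evaluation; it only records that a guard already exists, which is what lets a repeated application of the rule leave the plan unchanged.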

How was this patch tested?

Added new tests to sql/UDFSuite and AnalysisSuite.

// branch of `If` will be called if any of these checked inputs is null. Thus we can
// prevent this rule from being applied repeatedly.
val newInputs = parameterTypes.zip(inputs).map { case (cls, expr) =>
  if (needsNullCheck(cls, expr)) AssertNotNull(expr) else expr
}
Member

Could we introduce KnownNotNull instead of using AssertNotNull, which has a side effect?

@SparkQA

SparkQA commented Jul 23, 2018

Test build #93459 has finished for PR 21851 at commit 62fa9cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 24, 2018

Test build #93469 has finished for PR 21851 at commit b499b97.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • case class KnowNotNull(child: Expression) extends UnaryExpression

@gatorsmile
Member

retest this please

@gatorsmile
Member

update the PR description?

@SparkQA

SparkQA commented Jul 24, 2018

Test build #93503 has finished for PR 21851 at commit b499b97.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • case class KnowNotNull(child: Expression) extends UnaryExpression

@gatorsmile
Member

retest this please

@SparkQA

SparkQA commented Jul 24, 2018

Test build #93514 has finished for PR 21851 at commit b499b97.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • case class KnowNotNull(child: Expression) extends UnaryExpression

@gatorsmile
Member

gatorsmile commented Jul 25, 2018

LGTM

Thanks! Merged to master/2.3

@asfgit asfgit closed this in c26b092 Jul 25, 2018
asfgit pushed a commit that referenced this pull request Jul 25, 2018
The HandleNullInputsForUDF rule adds a new `If` node every time it is applied, so a plan that is analyzed twice (or more) differs from the same plan analyzed once. This causes problems such as the cache manager failing to match the plan. The fix is to mark the checked arguments as null-checked by wrapping them in a "KnownNotNull" node when the UDF is placed under an `If` node, since the UDF will not be called when any of those arguments is null.

Added new tests to sql/UDFSuite and AnalysisSuite.

Author: maryannxue <[email protected]>

Closes #21851 from maryannxue/spark-24891.